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I. INTRODUCTION 


A. BACKGROUND 

A database is a computer based record keeping system that 
contains information used to support an organization's 
tactical (short range) and strategic (long range) goals. For 
example, a database for a sales organization could contain 
customer, employee, sales and inventory data. 

Several data models are available to organize the 
information within the database so that it can be utilized in 
an efficient manner. One of the most common data models is the 
relational model. This method organizes data in terms of 
tables (relations), rows (tuples) and columns (attributes). 

Tables can be classified as either base tables or views. 
A base table is a table that physically exists in its own 
right. A view maybe thought of as a virtual table, in as much, 
that it does not (normally) exist within its own right but is 
instead derived from one or more underlying base tables 
[Ref. 1]. The view is stored as a definition in the data 
dictionary and is combined with a user's query to retrieve the 


requested data from the base tables. 


The use of views allows for the structuring and limiting of 
the information retrieved by a given query. This feature 
allows the user to receive data that is relevant to the 
application and limits unauthorized user access to other 
critical data. 

Recently several proposals have considered storing some 
form of the processed view to eliminate the need to evaluate 
the view definition from scratch every time it is queried. The 
first approach, known as full materialization, stores the 
fully processed view as a physical table. This approach has 
the advantage of increasing the efficiency of the queries on 
the view , but incurs an additional expense of maintaining the 
materialized view. To overcome this problem, a second 
approach, called semi-materialization, was proposed whereby a 
partially processed rather than a fully materialized view is 
stored. This approach redundantly stores data that represents 
selections and projections of individual relations, thus 
allowing efficient evaluation of the view definition while 
being easy to maintain. 

View performance processing is directly related to the 
performance of real time applications such as surveillance 
systems which support military operations. These systems 
receive periodic environmental updates from various sensors 
which are used to evaluate a view. Any delay processing the 
sensor data, which is typically time sensitive, into usable 


information could render the information late and unusable. 


Faster view processing used in conjunction with real time 
Systems will significantly improve the response time of these 


Systems. 


B. OBJECTIVE 

The objective of this thesis is to compare empirically the 
performance of three view processing strategies: query 
modification, semi-materialization and full materialization. 
The research attempts to verify the analytical results which 
have indicated that, in general, the semi-materialization 
strategy is the best method for processing general expression 
views [Ref. 2]. To accomplish this goal, this research 
develops a Data Generation Program to produce test databases 
according to user specifications. The test databases are then 
used to compare the performance of three view processing 
strategies for two view expressions and under different 
parameter settings, using a simulation program that was 
developed by Lt Jesse South  [Ref. 3]. Performance 
results were then collected, analyzed and plotted for 


presentation in this thesis. 


SCOPE AND METHODOLOGY 

This thesis accomplishes the following: 

1. Develcps a generic database generating program using 
ANSI C to generate test databases according to user 
specifications. 

2. Compares the performance of three view materialization 
strategies for select-project-join expressions with the 
database stored in Random Access Memory (RAM) and hard 
disk. 

3. Tests the three view strategies using general 
expressions with the database stored on RAM and on 

hard disk under different parameter settings, collecting 
the results and comparing them with analytical results. 
4. Uses the results to draw conclusions and determine the 


conditions under which each strategy performs the best. 


ORGANIZATION OF STUDY 


This thesis is organized as follows. Chapter II overviews 


the three view processing strategies. Chapter III provides a 


detailed description of the Data Generation Program. Chapter 


IV presents the performance results of the empirical study and 


compares them to the results of the analytical study. Chapter 


V presents conclusions based on the study and suggests areas 


for future research. 


II. VIEW MATERIALIZATION STRATEGIES 
The purpose of this chapter is to provide a general 
overview of the three view materialization strategies - query 
modification, semi-materialization and full 


materialization. 


A. QUERY MODIFICATION 

The conventional method for view processing for queries is 
query modification. This method stores a view definition in 
the data dictionary. This view definition is retrieved from 
the data dictionary when a query is issued on the view and 
combined with the user query into an equivalent query on the 
underlying base tables. This query is subsequently processed, 
and the results returned to its user. Consider the following 
database schema: 

EMP( Est, ENAME, ADDRESS, SALARY, TITLE) 

POS (E#,S#, LEVEL) 


and the corresponding view definition COMBATSTAFF: 


Ile. ENUM, e. ENAME, e. SALARY 
(op. LEVEL» 3( EMP»4POS)) 


Now when a query is issued against COMBATSTAFF: 


IIc. ENUM, c. ENAME (oc. SALARY? 30,000 (COMBATSTAFF) ) 


The view mechanism translates the query into the equivalent 


query on the base relations: 


lle. ENUM, e. ENAME 
(oe.salary?30,000\p. LEVEL>3 ( EMP»«POS) ) 


The resulting query is optimized to determine the best access 


path and then executed. 


B. FULL MATERIALIZATION 

This method creates an actual table based on the view 
definition. The resulting table is used to perform user 
queries, thus avoiding the cost of repeatedly retrieving a 
view definition and creating equivalent queries on the base 
relations. This method works quite well for processing 
queries, but is costly when the frequency of update is high, 
since the full materialized view must be maintained. 

Updates are defined as a transaction which performs a 
sequence of tuple insertions, tuple deletions, and tuple 
modifications on a relation(s). Suppose that a set of tuples 
A is added to a relation and a set of tuples D is deleted from 
the same relation. The tuple sets A and D represent the net 
change made to that relation. In that case, a tuple which is 
inserted and deleted in the same transaction would not appear 


in either tuple set A or D. 


Using this method, the net results of an update 
transaction could be used as a basis for a differential 
algorithm to update the materialized view. 

In fact this method works quite well using select-project- 
join expressions because selections and projections can be 
performed over unions. Using the view definition in the 
previous section and limiting the updates to the POS relation 


for simplicity, the view expression becomes: 


COMBATSTAFF'-COMBATSTAFF-lle . ENUM, e . ENAME, e. SALARY 
(p. LEVEL» 3 (D, »«POS) 


Ulle. ENUM, e. ENAME, e. SALARY (p. LEVEL? 3 (A, »«POS) ) 


The above expression shows that the fully materialized view 
can be maintained by computing the last two expressions and 
inserting them into or deleting them from the materialized 
view COMBATSTAFF. 

Unfortunately a similar expression can not be derived if 
a general expression is used in the view definition. At 
present no efficient differential algorithm exists for 
performing incremental updates for general expressions. This 
fact necessitates that a complete re-evaluation of the view 
expression be accomplished after each update to the base 
relations. The cost of re-evaluating a fully materialized view 
can be prohibitive as the frequency of updates for the base 
relations increase, which is the chief problem associated with 


this method. 


C. SEMI-MATERIALIZATION 

This method stores redundant subsets of carefully chosen 
data from individual base tables. These redundant subsets are 
stored as actual tables and represent an intermediate state of 
computing the view. Each subset is a projection and selection 
of the base table(s) thus making the construction of the view 
less costly than using the base relations. 

The redundant data is clustered on the join attribute(s) 
which allows for the efficient construction of the view. 
Updates to the base relations are screened to determine if the 
update affects the redundant tables. If it does, it is 
inserted into or deleted from the appropriate redundant 
tables. 

The following redundant subsets would be stored to support 
this technique: 

EMP'=Ile. ENUM, e. ENAME, e. SALARY (EMP) 
POS'-[Ip. ENUM(op. LEVEL»3 (POS) ) 


This view is combined with a user query to form an equivalent 


query on the redundant relations: 


IIc. ENUM, c. ENAME, C. SALARY ( EMP/r« POS") 


When queried the following equivalent view is created using 


the redundant tables and the view definition: 


Ile’. ENUM, e’. ENAME (2/. SALARY» 30,000 (EMP'v« POS") ) 


This method becomes more complicated as additional 
insertions and deletions occur. Since more than one base 
relation may have been the source of the tuples used in the 
materialized view, it becomes increasing difficult to 
determine, when or if a record should be removed from the 
view. 

To alleviate this problem each materialized view must keep 
a duplicate count of the number of tuples contributed, by each 
redundant subset, to the tuples in the materialized view when 
the subsets are joined. The count should be incremented or 
decremented depending on the transaction until the count 


becomes zero. 


III. DATA GENERATION PROGRAM 

The purpose of this chapter is to describe the Data 
Generation program. According to user specifications the 
program generates text files that are used subsequently to 
build the test database. As shown in Figure 1, the program 
reads control information from a text file created by a user 
or generated by the simulation program and generates the 
specified text files. The program allows the user to control 
the number of records (cardinality of the relation), the data 
type (ALPHA, NUMERIC or ALPHANUMERIC characters), the size of 


each field and the number of fields generated for each record. 


TEXT FILE 


data gen pgm 


Output file 





Figure 1: Data generation program data flow 
overview. 
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The process to generate the data is hidden from the user 
by using a fixed format control file as the user interface to 
the program. 

The program is written in ANSI C to increase portability 
of the source code and to minimize the changes necessary to 
transfer the program to a mini or mainframe environment. The 
maximum size of the text file generated by the program is 
limited only by the secondary storage available on the 


platform in use. 


A. GENERAL DESCRIPTION 

The Data Generation program receives control data from the 
text file "DATA IN". The information in the control data file 
is effectively divided into two sections. The first section 
determines the number of records, fields per record and the 
name of the output file. The second section defines each field 
within the record by type of information for the field (Field 
Type:Alpha, Numeric or Alphanumeric): the number of characters 
for each attribute (Field Width): the upper and lower bounds 
for any arrays and the incremental value used for counters. 
Data Generation program reads the data into a set of linked 
lists which are passed to the control modules by the main 


module to create each record. 
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B. REQUIREMENTS 

The requirement for the Data Generation program was based 
on a user request that a new generic data generating program 
be written in the C programming language to replace the 


previous database generating program. 


SAMPLE INPUT FILE 


300 5 EMP_DATA ASRDIT 


A. RECORD STRUCTURE B. FIELD STRUCTURE 


1. NUMBER OF 1. FIELD TYPE 4. ARRAY 
HA 


RECORDS 


?. NUMBER OF O: ALPHA- 
FIELDS NUMERIC 


3. TEXT FILE 2. FIELD WIDTH 5. INCREMENT 


NAME 
3. FIELD INFO 
S: SEQUENTIAL “Upper 
B: ARRAY BOUND 
R: RANDOM 
D: DEFAULT 





Figure 2: Data generation program parameters and parameter 
definitions. 
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The program accepts the following inputs and generates a 
text file used to create test databases: 

l. Number of text files required. 

2. Number of records per text file. 

3. Name of the text file. 

4. Number of fields per record. 


Size of each field. 


C 


OV 


Type of information in each field. 

7. Number of distinct values in each field. 
8. Upper and lower limits for the fields. 
9 


. Input reference for randomly generated characters. 


TO simplify the performance analysis several 
assumptions were made about the data generated for the test 
database. The first assumption was that the values for each 
field in the column were uniformly distributed over the range 
of values in the column. The second assumption considered each 
value in a given column to be independent of the values in the 


other columns. 


C. NOTES ON PROGRAM DESIGN 

The requirement for maximum program flexibility dictated 
a "layered" design approach be used, creating individual 
primitive modules to produce the varied types of output data 


requested by the user. 


r3 


To keep the coupling between the modules as loose as 
possible, the use of global variables is minimized and when 
feasible, only a single record structure is passed between 


the called and calling modules. 


MAIN MODULE 
PASSES CONTROL 


& RECORO 
STRUCTURE ' 


TYPE 1 
NUMERIC 
CALLS 


NUMERIC 
PRIMITIVE PRINTS 
OUTPUT 


PRIMITIVE MODULES 111 PRIMITIVE MODULES 12.1 
GENERATE AND PRINT GENERATE AND PRINT 


OUTPUT TO TEXT FILES OUTPUT TO TEXT FILES 





Figure 3: Data generation program control module 
overview. 


Each primitive module prints its output directly to the 
output text file with the exception of the 
generate numeric array module which returns its numeric output 
to the random generator module for conversion to alpha 


characters, if required. 
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This method was chosen after trial and error as the best 
method for facilitating the tracing of data and control flow 
through the modules. 

The rand() C library function was used to generate random 
data. Two C language record structures were used to establish 
the command language between the data generation program, the 


control file and the view materialization simulation program. 


D. PROGRAM MODULE OVERVIEW 
A brief description of each module is provided to clarify 
the control and data flows that are described in Section E. 
1. Main Module 
The main module opens and closes the input-output 
files, loads the control data into the record structures and 
directs the flow of the control data to the applicable modules 
for data generation. 
2. Time Hack Module 
The time hack module uses the system clock to compute 
the base reference for the generation of random alpha and 
numeric output values. 
3. Type Numeric Module 
The type numeric module is called by the main module 
to generate a numeric string, or call the sequential counter, 


random generator or the bounded sequential array modules. 
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4. Type Alpha Module 
The type alpha module is one of three process control 
modules used to determine the type of characters in a field. 
The module receives its input in the form of a record 
structure passed from the main module to generate a string of 
alpha characters, or to call the random generator or the 
bounded sequential array modules. 
5. Type Alphanumeric Module 
The type alphanumeric module is the last process 
control module and generates a single variable length string 
of alpha and numeric characters when called by the main 
module. 
6. Bounded Sequential Array Module 
The bounded sequential array module, which is called 
by either the type alpha or type numeric modules, receives 
three numeric values from the calling module. The values 
determine the array lower bound, the number of array elements 
and the incremental value of each element. The rand() function 
is used to generate a random index number to select the array 


element value that is printed in the output file. 
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7. Random Generator Module 

The random generator module is called in the same 
manner as the bounded sequential array module. The module 
determines if a character or numeric value is required, calls 
the generate numeric array module to produce the required 
value and prints the value or character in the output file. 

8. Random Long Array Module 

The random long array module is called by the 
type numeric module to produce a random numeric output 
employing the same rand( ) function that was used in the 
bounded sequential array module. The module computes the array 
size and determines if the number of array elements exceeds a 
preset limit. 

The module will compute the output value using the 
upper bound value and the rand() function to conserve main 
memory rather than allocating space for the array if the 
preset limit is exceeded. This method was used to prevent the 
program from using memory unnecessarily. 

9. Counter Module 

The counter module is called by the type numeric 

module and uses global values to generate up to three 


independent sequential counters. 
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10. Generate Numeric Array Module 
The generate numeric array module is called by the 
random generator module to produce a second independent 
bounded array similar to the bounded sequential array module 
except the random numeric output from the module is returned 
to the calling module for possible conversion to an alpha 
character. 
11. Print Modules 
The print modules are all used to send debugging data 
to a text file called "output.txt" that is controlled by a 


toggle called "TROUBLESHOOTING". 


E. DETAILED DATA AND CONTROL FLOW 

The Data Generation Program is called by a batch file 
which reads the control file "DATA IN". The input data is 
formatted to conform to the two record structures declared in 
the definition section of the program. 

Once the input data is loaded into the program, the 
control file is closed and the output file is opened. The 
output file name is part of the control file data. Each record 
structure is read and control is routed to the appropriate 
control module based on field type ( ALPHA "A", NUMERIC "N" 


and ALPHANUMERIC "OQ"). 
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The type numeric module will be used to trace the first 
data flow through the modules, the second data flow be traced 
using the type alpha module and the last data flow will use 
type alphanumeric module. 

"N" is the field type read by the main module in the 
attribute record structure. Control and the attribute record 
structure is passed to the type numeric module by the main 
module. The attribute record structure is read by the module 
to determine the field information (BOUNDED SEQUENTIAL ARRAY 
"B", RANDOM GENERATOR "R", COUNTER "S", RANDOM LONG ARRAY "Xx" 
or DEFAULT "D"). 

The field information type read by the module is "B" and 
the bounded sequential array module is called. The 
type numeric module converts the lower and upper bound 
character strings to numeric values which are passed to the 
bounded sequential array module along with the incremental 
data. The module uses the input data to determine array size, 
the lower bound and increment. 

Memory is allocated and the array is filled. The rand() 
function and the array size are used to compute an random 
index number to select the array element value to be printed 


in the applicable field in the output field. 
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Control is returned to the main module and the next 
attribute record structure is read. "A" is the next field type 
read by the main module: control and the attribute record 
structure is passed to the type alpha module. 

The type alpha module reads the attribute record 
structure. "R" is the field information read by the module. 
The random generator module is called, the lower bound 
character and upper bound character strings are read by the 
type alpha module. The character strings are converted to 
numeric values and passed along with the incremental data to 
the random generator module. 

The random generator module determines if the integer 
values represent alpha characters or numeric values. In this 
case, the values represent the upper case letters "A"(lower 
bound), "R" (upper bound), and the increment value of 1. The 
random generator computes the array size, and passes the 
values to the generate numeric array module to generate the 
array. 

The generate numeric array module allocates and fills the 
array. The rand() function is used to select an array value 
which is returned to the random generator module. The value is 
converted to an alpha character in the random generator module 


and printed in the applicable field in the output file. 
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Control is returned to the main module and the next 
attribute record is read. "O" is the next field type read by 
the main module: control and the attribute record structure is 
passed to the type alphanumeric module. 

Unlike the other control modules the type alphanumeric 
module does not call other modules. The attribute record 
structure is read to determine the total number of alpha and 
numeric characters required. Total field width is the 
aggregate of the two character strings. 

The characters are generated sequentially "A - Z" for the 
alpha string and "O - 9" for the numeric string. The 
characters are printed to the output file one at a time until 
the field is completed. 

The process for the other field information types is 
similar for both the type alpha and type numeric modules. 
ERROR handling is limited to verification of the input data 


and the opening of the required input and output files. 


F.  TESTING 

Testing was conducted on each module when it was created 
or updated. Small text files which simulated the input data 
for the particular module being tested was modified to test 
each module over a wide range of values. The entire program 
was tested using a variety of control files to create text 
files from 50 to over 60,000 records with at least 10 


attribute fields per record. 
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IV. PERFORMANCE ANALYSIS 


The purpose of this chapter is to describe and report the 
results of the empirical study conducted on the three view 
materialization strategies -- query modification, full 
materialization and semi-materialization -- using select- 
project-join (Model 1) and general expressions (Model 2). 
Performance testing was conducted on databases stored in 
Random Access Memory (RAM) and on a hard disk using a computer 
with an INTEL 80386Sx processor running at 20 MHz. The 
simulation program is written in ANSI C with embedded SQL 


commands to access the INGRES relational database system. 


A. SUMMARY OF THE RESULTS FOR THE ANALYTICAL MODEL 

Review of the results for the analytical model indicate 
that view processing strategies are most sensitive to the 
frequency of updates (P), the selectivity of the view 
predicate (fv), the selectivity of the query predicate (fq) 
and number of tuples (1) [Ref. 2]. For select-project-join 
expressions, and except for high values of P, both full and 


semi-materialization performed better than query modification. 
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Higher values of P, fv,l or lower values of fq favor semi- 
materialization over full materialization. At lower values of 
P, fv, and 1 full materialization is slightly better than 
semi-materialization as the update costs tend to be low. 

For general expressions semi-materialization performed 
better for all parameter values except for very low values of 
P. The absence of an efficient differential algorithm for 
performing incremental updates makes the use of general 


expression an unattractive alternative. 


B. EXPERIMENTAL SETUP 

The parameter definitions, parameter default values, 
access paths for the relations, query and view definitions and 
the profiles of the database relations which were used for 


the experiment are shown in Figures 4 through 8, respectively. 


Cardinaliy of the Relation. 

Number of update transactions on the base relations 
Total number of tuples modified by each update transaction 
Number of times the view is queried 


Probabilitity that a given operation is an update 
Selectivity of the view predicate (fraction of tuples In view) 
X of query predicate (fraction of tuples retreived by 





Figure 4: View materialization parameter 
definitions. 
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Figure 5: Default values for parameters. 


Relation(s) 


Clustered index on join fieid e_num 


Cluetered index on level 


Clustered index on join fleid 9.- num 


Clustered index on level 





Figure 6: Access paths for relations. 


The parameters that were considered for the sensitivity 
analysis include the following for each model tested: 

1. The fraction of updates to the total number of 
operations (P). This parameter is controlled by varying the 
number of update transactions on the base relations (kK) and 


the number of times the view is queried (q). 
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2. The selectivity of the view predicate (fv) or the 
fraction of tuples retrieved in the view with regard to the 
control relation POS. This fraction is controlled by varying 
the value v threshold (view cut) - that is, the predicates in 
the view definition. 

3. The selectivity of the query predicate (fq) or the 
fraction of tuples retrieved by the query on the view. The 
fraction (fq) is controlled by varying the value q threshold 
(query cut) - that is, the predicate in the query. 

4. The number of tuples modified by each transaction (1). 
This parameter is controlled by varying the number of tuples 
per update generated by the data generation program. 

5. The number of records in the base relation(s) 
(cardinality of the relation). 

Performance data was collected for view definitions using 
select-project-join and general expression predicates with the 
database stored in RAM and on hard disk. 

The database and the data generation program, view 
materialization simulation program and various Ingres program 
files were placed in separate sub-directories on the hard disk 
or in two similar RAM drives (4MB for database and 1.8MB for 
the other files) to determine if eliminating the hard disk 
access time (28ms average) would significantly improve the 


performance of the view processing strategies. 


Z5 


EXPRESSION 


CREATE VIEW FULL VIEW 


SELECT E NUM, ENAME, SALARY, KEYNO 


WHERE e.E_NUM * p.E NUM AND 
p.LEVEL »- VIEWCUT 


CREATE VIEW FULL_VIEW 
SELECT E_NUM, ENAME, SALARY, KEYNO 


WHERE EXISTS 
(SELECT * 
WHERE e.E. NUM = p.E NUM 


| p.o.KEYNO - p. I.KEYNO 
| AND p_i.LEVEL >= VIEWCUT ) 
| | 


QUERY SELECT E NUM, ENAME, KEYNO 
[2O50 [LLL WHERE saLAmv»-aueRvouT | 


LEGEND 





Two types of operations were conducted on the test 
database, a series of update transactions on the base 
relations which modified a varying number of tuples per update 


and queries issued against views. 


CARD (POS) = 5000 


ACCINFO 





Figure 8: Profile of database relations. 
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The average elapsed time per query for all updates and 


queries is used to compute the performance of each strategy. 


C. PERFORMANCE ANALYSIS 

This section discusses the performance of the three view 
processing strategies for view definitions which used select- 
project-join and general expression predicates. These 
strategies were applied to the test database(s) produced using 
the EMP, POS and SKILL text files generated by the data 
generation program. The reporting method will consist of 
reviewing the results for each parameter used in the 
sensitivity analysis of the two models. 

1. Model 1 : Select-Project-Join 

MODEL 1 uses the following view definition with a 

select-project-join predicate for the three view processing 


Strategies: 
IIe. ENUM, e. ENAME, e. SALARY (p. LEVEL»viewcut (EMP»«POS) ) 


a. Results for Database in RAM 
In this section, the results of the sensitivity 
analysis for Model 1 for the database in RAM is presented. 
Figures 9 through 12 show the results for model 1 for the four 
different parameter values when using the ram disk. 
In general, the trends computed for the analytical 
model were supported by the empirical results for the runs 


with the database stored in RAM. 
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The sensitivity analysis for the probability of 
update parameter shows semi-materialization performs best for 
values of P greater than 0.5 with the database in RAM. 

Full materialization was the clear winner for values of P less 
than O.5. 

This tradeoff occurs because for values greater 
than 0.5 the cost oof processing queries for full 
materialization averages .35 seconds while the cost to perform 
updates averaged .7 seconds. The cost per query for semi- 
materialization averaged .8 seconds but the cost for updates 
averaged to only .2 seconds. 

As the number of updates increased the update cost 
for semi-materialization was quartered while the cost for full 
materialization doubled. 

The average cost for query modification was 4.3 
second per transaction. Query modification exhibited the same 


trend as the analytical model. 
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— © Full Meterialization 


O 1 02 03 04 05 06 07 08 09 1 
Ratio of updates (P13) 
Figure 9: Total cost per query in seconds vs. the 


ratio of updates to the total number of operations 
P. 





For the selectivity of view parameter, the 
performance of the full and semi-materialization strategies 
were virtually identical for values of fv less than 0.3. The 
performance of  semi-materialization improved over the 
performance of full materialization as the value of fv 
increased. 

The average cost per update was .78 seconds and 
cost per query was .63 seconds for full materialization for 
values of fv less than 0.3. As expected, the cost for 
performing updates increased significantly as the value of fv 


increased. 
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Figure 10: Total cost per query in seconds vs. the 
selectivity of the view predicate fv. 


The average cost Of updates for semi- 
materialization was .76 seconds while the cost for queries 
averaged 3.26 seconds over the entire range of values for fv. 

The cost for query modification increased as the 
size of the view increased over the range of fv as expected. 
The empirical results for query modification were virtually 
identical to the analytical model trends. 

Full materialization provided the best performance 


for all values of fq for the strategies with the database in 


RAM. 
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The average cost per update for full 
materialization was .7 seconds while the cost per query 


averaged 2.2 seconds. 


1 02 03 04 05 06 07 0.8 09 
Selectivity of the query (FQ9) 





Figure 11: Total cost per query in seconds vs. the 
selectivity of the query on the view fq. 


The average cost per update for semi- 
materialization was .25 seconds but the cost per query 
averaged 3.7 seconds. Semi-materialization conformed to the 
trends indicated in the analytical results for fq. 

The average cost per transaction for query 
modification was 6.2 seconds which conformed to the 


performance noted for the analytical model. 


Su 


There was very little difference between the 
performances of semi- or full materialization over the entire 
range of values of 1 - number of tuples per update for the 
database in RAM. 

Full materialization performed slightly better for 
values of 1 less than 40. Semi-materialization performed best 
for values of l1 from 50 to 80 and greater than 90. 

The average cost per update for semi- 
materialization was .32 seconds while the average cost per 
query was .79 seconds for all values of l. The average cost 
per query for full materialization was .36 seconds and average 
cost per update was .87 seconds over the same range of 1 


values. 


10 20 30 40 60 
Number of tuples per update (U10) 





Figure 12: Total cost per query in seconds vs. the 
number of tuples modified by each update l. 
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Full materialization's performance for the l parameter far 
exceeded the expectations indicated by the analytical model. 
Query modification's performance on the analytical model 
indicated a slight improvement for higher values of 1 which 
was not supported by the empirical data. 

b. Results for Database on Hard Disk 

In this section the results of the sensitivity 
analysis for Model l on hard disk are presented for comparison 
to the results for the three strategies with the database in 
RAM. 

For the probability of update parameter, the 
transition to semi-materialization's performance exceeding 
full materialization occurs at 0.34. This indicates, as 
expected, that disk access time when added to the cost of 
performing updates has an significant impact on the 
performance of full materialization. 

Semi-materialization provides the best performance 
for values of P greater than 0.34. The figure shows a less 
steep increase for values of P greater than 0.7 than the 
results with the database in RAM. Note, however that the 
processing cost in RAM is less than 50% of the similar cost on 


the hard disk. 
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© Full Materialization 


02 03 04 05 06 07 08 09 
Ratio of updates (P3) 


Figure 13: Total cost per query in seconds vs. the 
ratio of updates to the total number of operations 
ER 





For the selectivity on the view parameter, semi- 
materialization performed best for all values of fv. The 
difference in the performance appears to be due to the 
additional cost added for accessing the disk to update the 
base relation plus the additional disk accesses required to 
update the view. 

In general the trends are identical to the trends 


exhibited by the analytical and RAM models. 
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Figure 14: Total cost per query in seconds vs. the 
selectivity of the view predicate fv. 


Semi-materialization was the best performer for 
values of fq that were less than O.4. Full materialization 
performance was better for values greater than 0.4. The 
performance for both semi- and full materialization was 
virtually identical for values of fq between 0.2 and 0.4. 

The trends for the three strategies conform to the 


results shown for the analytical model. 
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Figure 15: Total cost per query in seconds vs. the 
selectivity of the query on the view fq. 


Semi-materialization was the clear winner for all 
values of l1. Full materialization 's performance improved for 
values of 1 greater than 80 but did not out perform semi- 
materialization. Query modification's performance conformed to 
both the analytical and RAM models. The results for both 


semi- and full materialization exceeded the results shown for 


the analytical model. 
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Number of tuples per update (TUP3) 


Figure 16: Total cost per query in seconds vs. the 
number of tuples modified by each update 1. 


c. Discussion of the results for Model 1 

In general the empirical data supported the 
conclusions presented in the analytical review of the view 
materialization strategies [Ref. 4]. 

Semi-materialization's performance was superior 
for higher values of P, lower values of 1, fv and all values 
of fq for the database on the hard  disk.! Semi- 
materialization performed best with the database in RAM for 
values of P greater than 0.5, fv greater than 0.3 and for l1 


values between 50 to 80 and greater than 90. 


!  Semi-materialization was  outperformed by full 
materialization for only the first value of fv while using the 
RAM disk drive. 
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This was due to the low cost per update for semi- 
materialization when compared to the other strategies. The 
cost advantages of performing queries and updates on the 
redundant subsets is due primarily to the fact that any 
transactions performed using semi-materialization are on 
smaller table(s) than the base relations. 

Full materialization performed best for lower 
values of P, l1 and for all values of fq for the database on 
RAM. As expected full materialization performed best when the 
primary transaction was a query. Surprisingly, full 
materialization overall performance on RAM was quite good even 
for the parameters for which it was not the best performer. 
For example, in Figure 12, full materialization's performance 
was nearly identical to semi-materialization over the entire 
range of values for l. Similarly, full materialization's 
performance was not significantly worst than semi- 
materialization for values of fv as shown in Figure 10. Full 
materialization performed best with the database on hard disk 
for P less than 0.34 and for fq values greater than 0.4. 

Query modification outperformed full 
materialization for values of P greater than 0.82 for RAM and 
0.84 on the hard disk. This was due to the very high cost per 


update for full materialization as discussed previously. 
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2. Model 2 : General Expressions 
MODEL 2 uses the following general expression view 


definition expressed in relational calculus: 


. ENUM, e. ENAME, e. SALARYwherez3 
(e. ENUM-p. ENUMAD. LEVEL» viewcut(EMP»«POS) ) 


a. Results for Database in RAM 

The results for the database in RAM are displayed 
in Figures 17 through 20. The trends exhibited here are 
noteworthy since the performances for all the strategies 
exceed the results obtained from the earlier experimental data 
but tended to conform to the analytical model [Ref. 4]. 

The results for the probability of updates 
parameter are displayed in Figure 17 and indicate semi- 
materialization outperformed query modification for all values 
of P. It also outperformed full materialization for values of 


P greater than 0.1. 
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Figure 17: Total cost per query in seconds vs. the 
ratio of updates to the total number of operations 
P. 


Semi-materialization's average cost per update was 
0.22 seconds while its cost per query averaged 5.73 seconds 
over the entire range of P values. Full materialization 
averaged a cost per query of 0.31 seconds but its advantage 
was offset with an initial update cost of 62.89 seconds. Query 
modification's cost per transaction was 33.65 seconds. 

Semi-materialization was the best performer for 
all values of fv, which coincided with the trend for the 
analytical model. The simulation results for query 
modification and full materialization were better than the 


results for analytical model for all values of fv. 
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Semi-materialization's cost per update averaged 
0.68 seconds and its average cost per query was 21.3 seconds 
over the entire range of fv values. Full materialization 
averaged 1.5 seconds per query and its cost per update 
averaged 270 seconds. Query  modification's cost per 


transaction averaged to 51.7 seconds for fv values. 


Legend 
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Figure 18: Total cost per query in seconds vs. the 
selectivity of the view predicate fv. 


The analysis for the selectivity of the query 
parameter shows that semi-materialization was the most cost 
effective strategy for processing queries for general 


expressions. 
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The analytical and empirical results for query 
modification were nearly identical for the entire range of fq 
values. Full materialization proved to be the worst performer 
of the strategies, as displayed in Figure 19. 

Semi-materialization cost per update for fq 
averaged to 0.23 seconds while its average cost per query was 
20.7 seconds. The average cost per update for full 
materialization was 63.5 seconds and its cost per query was 
1.3 seconds. Query modification's cost per transaction was 


46.1 seconds. 


Legend 
X Query Modification 
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Figure 19: Total cost per query in seconds vs. the 
selectivity of the query on the view fq. 
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Figure 20 shows that semi-materialization 
outperformed full materialization and query modification for 


all values of 1. 


10 20 30 40 50 60 70 80 90 1 
Number of tuples per update (U8) 





Figure 20: Total cost per query in seconds vs. the 
number of tuples modified by each update l. 


Semi-materialization's average cost per update was 
0.28 seconds and its cost per query was 5.7 seconds for all 
values of l. Full materialization's cost averaged to 0.32 
seconds per query and 63.2 seconds per update. Query 


modification's transaction cost averaged 33.7 seconds. 
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The slight increase in performance cost over the 
entire range of l anticipated by the analytical model for 
semi-materialization as the number of tuples per update 
increased was not supported by the simulation results. 

Query modification and full materialization conformed to the 
results plotted for the analytical model [Ref. 4]. 
b. Results for Database on Hard Disk 

In this section the results of the sensitivity 
analysis for Model 2 on hard disk are presented. Figures 21 
through 28 show the results for model 2 for the five different 
parameter values when using the hard disk. 

Unlike the experiment conducted in RAM for the 
three view processing strategies, the cardinality of the POS 
relation will be varied from 7500 to 10,000. The methodology 
used to conducting the sensitivity analysis for the four other 
parameter values (P, fv, fq and 1) was exactly the same as the 


methodology used for the experiment conducted in RAM. 
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The results for the probability of updates 
parameter is displayed in Figures 21 and 22 shows that semi- 
materialization out performed query modification over the 
entire range of P values. Its performance was better than full 


materialization for values of P greater than 0.1. 


Legend 
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Figure 21: Total cost per query in seconds vs. the 
ratio of updates to the total number of operations 
P for 7500 records. 
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Figure 22: Total cost per query in seconds vs. the 
ratio of updates to the total number of operations 
P for 10,000 records. 
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The reason for semi-materialization performance is 
its low cost per update which offset its average cost per 
query. For an cardinality of 7500 records, semi- 
materialization's average cost per update was 1.55 seconds and 
19.82 seconds per query. Its average per update for an 10,000 
record cardinality was 3.41 seconds with a cost per query of 
21.21 seconds. 

Full materialization performed best for a P value 
of 0.1 or less for both cardinality values but the extremely 
high cost of its first update (250 seconds for 7500: 417 
seconds for 10,000) quickly overcame its cost advantage for 
processing queries. 

Query modification outperformed full 
materialization for P greater than 0.2 because of full 
materialization high cost per update. 

Semi-materialization outperformed both query 
modification and full materialization over the entire range of 
fv values. The cost per update for semi-materialization with 
a cardinality of 7500 was 3.7 seconds and its cost per query 
was 51.2 seconds for all values of fv. Semi-materialization's 
cost per update for a cardinality of 10,000 records was 7.1 
seconds while its cost per query was 59.0 seconds. Query 
modification cost per transaction averaged 102 seconds for 


7500 records and 180 seconds for 10,000 records. 
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Full materialization cost per update increased at a rate of 
150$ for 7500 records and doubled for 10,000 records as the 


size of the view increased. 


y of the view (FV19) 
Figure 23: Total cost per query in seconds vs. the 
selectivity of the view predicate fv on 7500 
records. 








Figure 24: Total cost per query in seconds vs. the 
selectivity of the view predicate fv for 10,000. 
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As shown in Figures 25 and 26; semi- 
materialization outperformed query modification and full 
materialization for all values of fq. 

The average cost per update for semi- 
materialization with a cardinality of 7500 was 1.5 seconds and 
its cost per query was 64.5 seconds. 

Semi-materialization's averaged costs for a 
cardinality of 10,000 was 2.8 seconds for updates and 73.0 


seconds for queries. 
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Figure 25: Total cost per query in seconds vs. the 
selectivity of the query in seconds on the view fq 
for 7500 records. 
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Figure 26: Total cost per query in seconds vs. the 
selectivity of the query on the view fq for 10,000 
records. 


The averaged costs for full materialization for 
7500 records was 2.5 seconds per query and 256 seconds per 
update.Query modification's costs averaged 116.0 seconds for 


7500 records and 150.6 for 10,000 records. 
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As expected semi-materialization was the clear 
winner over both query modification and full materialization 


over the entire range of 1 values. 





Number of tuples per update (U12) 


Figure 27: Total cost per query in seconds vs. the 
number of tuples modified by each update Z for 7500 
records. 


Legend 


C] Sem! Msteralizetic 
O Full Materialization 
10 20 30 deb 00 80 90 1 
Number of tupies per update (U9) 
Figure 28: Total cost per query in seconds vs. the 
number of tuples modified by each update 1 for 10, 
000 records. 
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Semi-materialization's performance cost averaged 
2.1 seconds per update and 18.8 seconds per query with a 
cardinality of 7500 for all values of l. 

Similarly its averaged costs for 10,000 records 
were 3.7 seconds per update and 20.2 seconds per query. Full 
materialization's costs averaged 0.71 per query and 253 
seconds per update with a cardinality of 7500. Its average 
costs were 0.72 seconds per query and 421 seconds per update 
for 10,000 records. Query modification had an average cost of 
68.7 seconds per transaction for 7500 records and 103.2 
seconds for 10,000 records. 

The additional cost associated with using a hard 
disk drive and increasing the cardinality had an impact on the 
performance experienced for both query modification and full 
materialization. 

The cost for processing queries using query 
modification on the hard disk increased three fold over the 
cost of processing the query in RAM. 

The cost of processing updates using full 
materialization increased by 700% over the same cost for 


updates in RAM but the cost for query processing only doubled. 
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The costs for semi-materialization also increased 
by a factor of ten for updates and quadrupled for query 
processing. The increase in processing costs for semi- 
materialization was offset by the use of the redundant subsets 
Of the base relation which allowed for a more efficient 
construction of the materialized view. 

c. Discussion of Results for Model 2 

Semi-materialization performed better with the 
database in RAM and on hard disk for the entire range of 
values of cardinality than both query modification and full 
materialization for all parameters except for a value of P 
less than O.1. Full materialization performed better for P 
less than 0.1 because its average cost per query was only .42 
seconds while  semi-materialization's cost averaged 15.4 
seconds. 

For values of P greater than O.1 the cost for 
performing a single update for full materialization rose to 
62.89 seconds for 5000 records in RAM, 250 seconds for 7500 
records on hard disk,and 417 seconds for 10,000 records on 
hard disk which offset any advantage offered by full 


materialization superior performance for query processing. 
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This dramatic cost increase for update processing for full 
materialization using general expression predicates is due the 
lack of an efficient differential update algorithm. This 
necessities the complete re-evaluation of the view definition 
for any update transaction. 

Query modification outperformed full 
materialization for all parameters except for values of P less 
than 0.35 for all values of cardinality. As indicated above 
the cost of a single update transaction for full 
materialization quickly drives its cost higher than an other 
strategies. 

In this case as the number of updates increased 
the aggregate cost for query modification dropped since the 
cost of updating base relations to support this strategy are 
not timed. The cost of processing four or less updates 
(average cost per update for query modification is 0) combined 
aggregate cost for processing six queries (average cost per 
query is 33.6 seconds) is more than the cost for processing 
the same number of transactions for full materialization. 

Full materialization's superior performance for 
lower values of P is evident and is based its low cost for 
processing queries on the view. This advantage was quickly 
overwhelmed by the overhead of maintaining the fully 


materialized view [Ref. 4]. 
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Semi-materialization performed best on hard disk 
for all parameters expect for a value of P of 0.1 or less. 
Full materialization was the best performer for that value of 
P because the only transaction performed for those values was 
query processing. 

As indicated above, semi-materialization is the 
best strategy for processing view definitions using general 
expressions for predicates. 

It is interesting to note that the results shown 
in Figures 17 through 28 show that, in general, the 
performance trends for the view processing strategies are the 
same for P, fv, fq, and l1 for the entire range of values for 


cardinality of POS. 
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V. CONCLUSIONS AND RECOMMENDATIONS 
The purpose of this chapter is to state conclusions based 
on the research and make recommendations for improvements and 


further study on the three view materialization strategies. 


A. CONCLUSIONS 

The empirical data of this thesis confirms that the semi- 
materialization strategy is best method for processing views 
with predicates using general expressions. 

The performance of semi-materialization with the database 
in RAM exceeded the trends forecasted for general expressions 
for the analytical model or actual results achieved on the 
earlier experimental study [Ref. ?]. This is reasonable 
because of the cost penalty paid by the strategy when it 
becomes necessary to access a hard disk to perform updates and 
queries. 

The trends for the simulations for semi-materialization 
with its database stored in RAM indicate it may be suitable 
for near real time view processing using general expressions 

based on its relatively low average costs for updates and 


queries. 
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Full materialization performed well for lower values of P 
due to its low average cost per query while query 
modification's performance was good over all parameter values 
but both strategies are less efficient than semi- 
materialization for general expressions. 

Select-project-join view definitions with the database in 
RAM proved to be the most cost effective method for view 
processing (see Figures 9-12). 

Overall performance for all parameter simulations using 
this view definition and the ram disk drive were three to five 
times faster than similar runs using a hard disk drive. These 
savings are significant when considering view processing for 


the small databases inherent to tactical environments. 


B. RECOMMENDATIONS AND FUTURE RESEARCH 

We recommend that the same simulations for select-project- 
join and general expressions be conducted with an 80486 
processor with a minimum of 16 MB of RAM to test databases 
with up to 20K records. The internal 8k cache and math co- 
processor should significantly reduce the processing times for 
all three strategies using either view definition. 

We predict that this approach could improve the 
performance of the semi-materialization strategy well enough 


to make it feasible for use with real time tactical systems. 
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For example, the electronic order of battle maintained on 
board a tactical aircraft could be completely updated during 
or enroute to an engagement using information received by its 
own Sensors or sensor information passed from other sources. 

This strategy could be used in conjunction with the Joint 
Ocean Tactical Surveillance (JOTS) system to provide a real 
time computer generated picture of the tactical and strategic 
environments. 

Used in this manner, JOTS could be placed on board classes 
of ships which do not have the Naval Tactical Data System 
(NTDS) installed on board at a tremendous savings over back 
fitting the vessels with NTDS. The information would improve 
the vessel's mission performance by keeping the Commanding 
Officer constantly updated with real time battle group 
position data and allow for the information received by that 
vessel's sensors to be incorporated into the tactical picture. 

We also recommend conducting more simulations on actual 
databases with more than two relations, and updates applied to 
several relations. 

Finally, further work is needed to investigate the 
performance of view processing strategies in the presence of 


overlapping views over the same relation. 
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APPENDIX A. DATA GENERATION PROGRAM 
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/*Author: Curtis Barefield */ 


Title: Data Generation Program */ 
/*version: MS C 6.0 /C++ (T2) m 
/*created: 17 June 91 */ 
/*updated: 11 Aug 91 */ 


f/ HR H0 HH RR RH o Fe e e Fe ee HN FN HR Fe e eH Fe HH e e HU Y Fe HH UA 


* 


* 


This program was written to replace the previous hardwired test database * 


generating program with a more generic program. This program generates the text records* 


used to create the database used to test the view materialization * 
strategies purposed by Professor Magdi Kamel. The associated test t 
pg has been written by Lt. Jesse South, USN, a CSM student in class PL03. È 


Toe fede dede ee dee fee dde defe df dH HH AFF d He RU HH FN URNA) 


include «stddef.h» 


include <stdio.h> 


include <stdlib.h> 


include «time.h» 


#include <ctype.h> 


#include «string.h» 


#define size 16 /* sets buffer size for output file name */ 


#define ALPHA 1 /* set buffer for type/info values */ 


#define BOUND 6 /* set buffer for upper/lower bounds*/ 


/*DEBUG TOGGLE */ 


#define TROUBLESHOOTING 0 


/* 1 sends debug data to output.txt file */ 


/* prints data used for debugging pg */ 


void print random generator array(int, int*), print bounded array(int, long*): 


/* used to read data into structs */ 


int rand(), count, incrementl, increment2, increment3, t; 


long in, resultsl, results2, results3: 


/* modules used to generate random values */ 


unsigned timer, time hack(); 


void srand(unsigned int): 


FILE *input file, *output file; /* file pointers for text files */ 
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JRERARARARARRARARARARAARARARARARAARASTRUCTS OA ARRARARA RARA AARARERAARRARARARARARERAARRA RARA RA RRARARARARARRARA / 


struct field attributes 
( 
char field type[ALPHA]; 
int field width: 
char field information[ALPHA]; 
char lower bound{BOUND]: 
long increment; 
char upper bound[BOUND]: 
struct field attributes*next; 
)JATTRIBUTE: 
struct spec type 
( 
long number of records; 
int number of fields; 


char file name[size]: 


struct field attributes *flret field: 
struct spec type tnext; 
)SPECIFICATION; 


/* declares modules used to generate attributes */ 
void type alpha( struct field attributes *); 

void type numeric( struct field attributes *):; 

void bounded sequential array( long, long. int); 

void type alphanumeric( struct field attributes *); 
void counter( int? vint): 

void random generator(int, int, int): 

void print database specifications(struct spec type Sy 
void random long array(long, long, long): 


struct spec type *list - NULL: 
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/ERARARAREAAEREREERETRARERE TERRE REREMAI|NAAAEERERERERERETAERERERERETIARERERERERAR/ 


void 


( 


main() 


char file _name(size], field type(ALPHA], field information[ALPHA]: 
char c lower bound[BOUND]. c upper bound[BOUND]: 

long number of records; 

int field width, number of fields, i, increment; 


/* creates and defines ptrs used for linked lists */ 


struct field attributes *new attributes NULL: 
struct field attributes *next field - NULL; 
struct field attributes *end = NULL; 

struct spec_type *new_spec= NULL: 
struct spec_type *top spec = NULL: 
struct spec type *next spec- NULL; 


list s NULL; 


input files fopen("data in","r"): 
if((input file)--NULL)( 
printf("CAN NOT OPEN INPUT FILEMn"): 
) /*opn input file*/ 
/* loads structs and creates linked lists */ 
while(:(feof(input file))) 
( 
/* allocates memory for each specification struct */ 
new spec * (struct spec type *)malloc(sizeof(struct spec type)): 
if(top spec == NULL) /* sets ptr to top of linked list */ 
( 
top spec = new spec; 
next spec = new spec; 


list = new_spec; 


else 


next spec-»next s new spec;/* sets ptr to next spec */ 


next spec = next spec->)next; 
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/* read in data and loads structs */ 
if(1 !- (fscanf(input file," Xld".&number of records)))í( 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."): 
exit(0): 
) 
next_spec->»number_of records=number_of records; 
1f(1 !- (fscanf(input file,"t1d",&number of fields)))( 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."); 
exit(0); 
) 
next_spec->number of fields=number of fields; 
1f(1 != (fscanf(input file,."%s".file name))){ 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."); 
exit(0): 
) 
strcpy(next spec-»file name,file name): 
next spec-ofirst field = NULL: 
next_spec->next = NULL; 
/* create linked list for attributes */ 
for(i=0; i<next_spec->number of fields; ++1) 
{ 
/* allocates memory for attribute struct */ 
new attribute -(struct field attributes *) malloc (sizeof(struct field attributes)); 
if(next_spec->first field == NULL) 
( 
next spec-»first field = new attribute; /* sets ptr to top of list */ 


end - next spec-»first field: 


else 


end-»next = new attribute; /* sets ptr to next attribute */ 
end = end->next; 
) 
/* load attribute struct */ 
if(1 !'» (fscanf(input file,"Xs",field type)))( 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."): 
exit(0): 


) 
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strcpy(end-»field type.field type): 

1f(1 :* (fscanf(input file,"Xd",&field width)))( 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."): 
exit(0): 
) 

end->field width=field width: 

1f(1 := (fscanf(input_file,"%s", field information)))( 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."): 
exit(0): 
) 

strcpy(end-»field information,field information); 

if(1 !- (fscanf(input file,"Xs",c lower bound)))( 
printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."); 
exit(0):; 
} 

strcpy(end-»lower bound.c lower bound): 

1f(l :» (fscanf(input file,"Xd",&increment)))( 


printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."): 


exit(0); 
) 
end->increment = increment; 
if(l != (fscanf(input file,"€$s",c upper bound)))( 


printf(" UNEXPECTED VALUE. PROGRAM TERMINATED."): 
exit(0): 
) 

strcpy(end-»upper bound.c upper bound): 


end -»next =NULL; 


fclose(input file); 
/* Redirects monitor output to text file called output.txt */ 
#1f TROUBLESHOOTING 

freopen("output.txt”,"w", stdout); 

print database specifications(list): 

t=0; 

#define DEEP_DEBUG l 


#endif 


63 


/* uses system time to generate random number */ 
timer=time_hack(): 
srand(timer) ; 
/* uses linked list to read each spec (BEGIN PG EXEC ) */ 
while(list != NULL) 
{ 
resultsl=0; /* sets module "counter" to zero */ 
results2-0; 
results3=0; 
incrementl=0: 
increment2=0; 
increment3=0; 
output file= fopen(list->file name, "w"); 
if((output file)--NULL)( 
printf{"CAN NOT OPEN OUTPUT FILE\n"): 
exit(0): 
) /*opn output file*/ 
/* BUILD DATABASE TEXT FILES */ 
for(insO;in«list-»number of records;**in) 
{ 
next field = list->first field; 
count=0; 
while(next field !=NULL) 
{ 
if(next field -» field type[O]--'A') 
( 
type alpha( next field): 
) 
else if{next field -> field typel(0]=="'N') 
{ 
type numericí( next field); 
) 
else if(next field -> field type[0]=='0') 


{ 


type_alphanumeric{next_ field); 
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next field - next field-»next:; 
if(next field !=NULL) ( 
forintf(output f1lB-' ^"): 
) 
) 
tprintfloutput- file. An )- 
) 
list = list->next; 
fclose(output file): 
) 
exit(0): 


) 


[ AAAAARARAARAARARARERAEARAARARRAARAERARRRAEARAERRARAERRARARAAAEARARAEARAERRERARE EE 


È Time Hack uses the system clock to provide the seed value E 


s for the rand and srand library functions. 2 


ERRARORAURARARAERARRERRARARARARARARARARARARARARRRARRARARRRARARRARRARARAAA Y 
unsigned time hack() 
( 
clock t rand input; 
unsigned seed input; 
rand inputsclock(): 
seed _input=(unsigned)rand input; 
return(seed input); 


} 


[ARRRRARARARARARARERARAARARARARARARARARAARARARARARARAARARRARRARARRARARRARAR 


z Bounded sequential array x 

= creates a bounded array which is used with the rand() library * 
* function to select from a user specified number of array n 
* elements to return a value from within that array. * 


ARARAARARAARARRARERRARRERAERARRRERERRERRRARRARRARRAARARARERERARRARRAERRRR RH / 
void bounded sequential array(long íncrement, long lower bound, ínt number of values ) 
( 

int m, long index; 

long *long storage,long_ low; 

/* allocate storage space in memory for array */ 


long storage-(long*)calloc(number of values,sizeof(long)): 
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) 


J* £1) lvarrFAy 2 
long storage[0]-1ong lowslong lower bound: 
for(m=1; m< number of values: m++) 
{ 
long_storage(m]=long_low +=long_increment; 
) 
/* select value fm array using generic random generator */ 
long index-(rand()X€number of values); 
fprintf(output file,"XO5ld",long storage[long index]): 
dif DEEP DEBUG 
IfCotelr 
print bounded array(number of values, long storage):] 
t++; 


#endif 


[ERRRARARRARRARARARAARARARARRARARARARARARARARARARARARARARARARAARARARARRRRA AR 


* 


Type alpha generates alpha character string to fill user * 


determined field size * 


RRÁRAERARARRARAAARARRAARRARARARRARARRARARARARARARARARRAARARARARARARARAARRAARARARAAR RARA Y 


void type alpha( struct field attributes *next field) 


( 


char c; 

static char start='A'-1; 

int j. increment, lower bound, upper bound: 

int number of values; 

long increment, long lower bound: 

/*****9$* Conversions for Random Generator **$***4*4*9449**/ 
lower bound-(int)(next field -»lower bound[0]): 

upper bounds(int)(next field -»upper bound[0]): 
incrementznext field -»increment; 
¡SeAARARARAAAA Conversions for Bounded Array ********/ 
long_lower_bound=(atol) (next_field -»lower bound): 
long incrementsnext field -»increment; 
number _ of values=(atoi)(next field ->upper bound); 
if(next field -»field information([O]ss'R') /* Random field selected */ 


( 


random generator(increment, lower bound. upper bound); 
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else if(next field ->field information[0]=='B') /* Bounded sequential field 


selected */ 


bounded sequential array(long increment, long lower bound, number of values): 
) 
else if(next field -»field information[0]ss'X';; 'D') 


/* countup counter places chars 'A'-'Z' in field */ 


for(j=0:j<next_ field ->field width;++j) /* sets field width */ 
{ 
if(start »s'2') 


( 


starts 'A': 


else 


++start; 
) 
fprintf(output file."%c", start); /* prints alpha char to 


output file */ 


) 


f R8 FUROR ROUEN EE 


s Type numeic generates numeric chararters to fill user * 


" determined field size 
RARARERAAARARAAARAARARARAAARARAARAARARARAAARRAARAARARAARARAR RARA RARA AAA Y 
void type numeric( struct field attributes *next field) 
( 

char c; 

static start='0'-1; 

int j, lower bound, increment, upper bound: 

int number of values, increment c, lower bound c: 

long increment, long lower bound. increment long array; 


long lower bound long array, upper bound long array: 
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[ERRARARARARARR Conversions for Counter RARRAERRERRERRAA/ 
increment c = next field-»increment; 
lower bound c -(atoi)(next field-»1lower bound); 
/******* Conversions for Random Generator ****$44*****/ 
lower _bound=(int)(next field ->lower bound[0]): 
upper bound-(int)(next field -»upper bound[0]): 
increment-next field -»increment: 
/************* Conversions for Bounded Array ***+*.*..../ 
long lower bounds(atol)(next field -»lower bound); 
long increment-znext field -»increment; 
number of valuess(atoi)(next field -»upper bound); 
[/************* Conversions for Random Long Array ********, 
lower bound long array -(atol)(next field -»lower bound): 
increment long array -(long)(next field -»increment): 
upper bound long array =(atol)(next field ->upper bound); 
if(next field -»field information[0]s*'S') /* sequential counter */ 
( 
counter(increment c, lower bound c): 
count++; 
) 
else if(next_ field ->)field information[0]=="R') /* Random */ 
{ 
random generator( increment, lower bound, upper bound); 
) 
else if(next field -»field information[O]s-'B') /* Bounded sequential */ 
( 
bounded sequential array(long increment, long lower bound, number of values ): 
) 
else if(next field -»field information[O]s2'X') /* Random long array */ 
( 
random long array(increment long array. lower bound long array. upper bound long array le 
) 


else if(next field ->)field information[0]=='D') 
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/* countup counter, fills field with '0' - '9' in sequence */ 


for(j=0:j<next field ->field width:++j) /* sets field width */ 
{ 
if(start 5»s'9") 


( 


start= ‘0’: 


else 


++start; 
} 
fprintf(output file,"Xc",start): /* prints numeric char to 


output file */ 


) 


[AARAARRRARRRRRRARRERRRRRRARRRARRRERERRRRERRRRRERERRRRERERERARERERRRRERER 


e Type alphanumeric generates alphanumeric chararters to fill * 
* user determined field size * 
RAEARRAEEEAARAARARARAEAARAAARARARARARARAAAAARARARARARAARARARARARARARARARARARARRARARARAAR AY 
void type alphanumeric(struct field attributes *next field) 
( 

cher c, d: 

int Jj. k /*, count=0*/ : 

static char alphaz='A'-1: 

static char numeric='0'-1: 

int alpha field width, numeric field width; 

alpha field widths(atoi)(next field -»1ower bound); 


numeric field widths(atoi)(next field -»upper bound): 
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for(j=0;j<alpha_field width:++j) /* generates 


{ 


) 


if(alpha >= 2") 


{ 


alpha='A'; 
) 
else 
{ 
++alpha: 


) 


fprintf(output file,"*c", alpha): 


for(k=0:k«numeric field width; ++K) /* numerícs 


( 


ifínumeric >='9') 


( 


numerics'O':; 


else 


++numeric; 


) 


fprintf(output file,"Xc",numeric): 
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alpha chars 


nO zug 


So 


'A'-'7’ 


A 


f ERRARE RUN NAAU RR RNURGN URN NON NGA IN ANUUAUNANUNON IUN NERREEUERERURUA 


a Random generator takes a lower, upper and È 
È increment int value, creates an numeric array and * 
» uses the functions rand() and srand() to * 

* select an array element which maybe converted to a È 
È alphanumeric char for printing to the output file. È 


To yd de df d Hd d dde d de dd d d f d d d de d d d IEEE iT iii ERA / 


void random generator( int increment, int lower bound, int upper bound) 
( 
/* Declare module data elements */ 
int numeric index; 
int output from array; 
int generate numeric array(int,int,int): 
char alnum character output; 
/* Compute array size */ 
numeric index- ((upper bound-lower bound)/increment): 
/* Determine if a set of bounded random numbers are required */ 
if(((lower bound»47)&&(upper bound«58)) |; (lower bound»64)&&(upper bound«91)) 
( 
output from array- generate numeric array(numeric index, increment, lower bound): 
alnum character output-(char)(output from array): 


fprintf(output file, "%04c",alnum character output); 


else 


output from array= generate numeric array(numeric_index, increment, lower bound): 


fprintf(output file, "XO4d", output from array): 
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[ RRRRRRRRRRRERRRRRRARRRRERRRRRRRERERRRERRERRRRRARRRERARRERRRRERRRRERRRERR 


* Random long array takes a lower, upper and È 
x increment long value, creates an numeric array and * 
2 uses the functions rand() and srand() to * 

e select an array element which is printed to È 
= the output file. z 


RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRERRRRRRRRRRRERRRRRRRRRRRRRR / 


void random long array( long increment long array, long lower bound long array. long 
upper bound long array) 
( 
/* Declare module data elements */ 
int m, index: 
long numeric index, low: 
long f* output from array; 
long output from random; 
/* Compute array size */ 
numeric index- ((upper bound long array-lower bound long array)/increment long array): 
if (numeric_index<20) 
{ 
/* Allocate memory block for long array*/ 
output from array= (long*)calloc(numeric index,sizeof(long)): 
/* Set lower bound of array */ 
output from array[0]= low= lower bound long array; 
/* Load array */ 
for(m=1;m<cnumeric_index;m++) 
( 
output from array[m]= low+= increment_ long array: 
} 
indexs (rand()Xnumeric index): 


fprintf(output file,"XO41d",output from array[index]): 


else 


output from random= (1+(rand()%upper bound long array)): 


fprintf(output file,"%04ld",output from random); 
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[SREREEARRARRARRARAEAARARAAAARAARARARARAAERARAAARAERAAAARARARARARAARARAARARAARAAAA AA 


* Counter uses lower bound and increment to act as z 
= a sequential counter for max of numbers per spec type. x 
AERAERRARAARARARAEAAAAAAARARARARARARARRARARAARAERARARARAARARAARARARARARRARRARARARAAA Y 
void counter( int increment c, int lower bound c) 
( 
{tA ne eee FIRST COUNTER RERAERARAARA Y 
if ((in==0)&&(results1==0) &&(count==0) ) 
{ 
resultsl - lower bound c: 
incrementl - increment c; 
) 
else if(count==0) 
( 
resultsl = resultsl+incrementl: 
) 
if (count==0) { 
fprintf(output file,"%04ld",resultsl):} 
/*********** SECOND COUNTER ******sssuus / 
if((in==0)&&(results2==0)&&(count==1)) 
{ 
results2 = lower bound c: 
increment2 = increment_c:; 
} 
else if(count==1) 
( 
results2 = results2+increment2; 
} 
if(countsz1)( 
fprintf(output file,"XO41d",results2):) 
fJ*f******$$*** THIRD COUNTER *****99444**/ 
if ((in=20) &&(results3==0) &&(count==2) ) 
{ 
results3 = lower bound c: 


increment3 s increment c; 
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else if(count==2) 


results3 = results3+increment3: 
} 
if(count--2)( 
fprintf(output file,"xO41d",results3):) 


f ARR o d d d t / 


) 


JTRRERRERRACACAR ERA AAA LAA AAA ACA AAA ETERNA NN T a A 


* Generate numeric array produces a bounded = 
* array and uses the rand() function to simulate a real = 
E random number generator * 


ERERARARARARARARAARARARRAARARRARARARA RARA AARARARARERARAAARARARARARAARA RANA / 


int generate numeric array(int numeric_index, int íncrement, 


int lower bound) 


int m. index; 
int *numeric storage, low; 
numeric storage-(int*)calloc(numeric index.sizeof(int)); 
numeric_storage[0]= low = lower bound: 
for(m=l; m<numeric index; m++) 
{ 
numeric_storage[m]=low += increment; 
} 
index=(rand()%numeric index); 
return(numeric storage[index]): 
#if DEEP DEBUG 
if(e< 
Print random generator array(numeric index, numeric storage}; } 
cre 


#endif 
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[RARRRRRRRRERE PRINT RANDOM GENERATOR ARRAY RARARERARARARARAARARRARARAAAARRA A / 
void print random generator array(int numeric index.int *numeric storage) 
( 

Int di: 

for(i=0;i<numeric_index;1i+*){ 


printf(" array[%d] = %d\n",i,numeric storagel[i]); 


) 


f****ht tette tete ttt PRINT BOUNDED ARRAY RARARARARARARARAERARARARAARAERARARARARRRR 
void print bounded array(int number of values,long *long storage) 
( 

int i: 

for(i=0:i<number of values:i++){ 


printf(" array[%d] = xO51dWMn",i,long storage[i]): 


) 


fa*9 d dd dd dd d dtd 


* Prints specifications including all attribute link lists È 


RRERARARARAARARARARAAARAARARAARARARAAAAARRARARARRARARARRARRARAARAARARAMAAAA Y 
void print database specifications(struct spec type *list) 
{ 
struct field attributes *next field: 
Into: LOs 
printf("Printing link lists for generic database generator\n"): 
while(list !=NULL) 


{ 


printf("\tSPEC €dWMn",*««1); 
printf("\tThere will be %ld records\n",.list->)number of records); 
printf("\Tthere will be %d fields\n".list->)number of fields); 
printf("\tThe file name is %sin",list->file name): 
next field *» list-»first field: 
while(next field ! zNULL) 
( 
printf("\tField %d:\n".++1); 
printf("\t%s is field type\n".next field->)field type): 


printf("\t%d is field width\n".next field->field width); 
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printf("\t%s is field info\n".next field->field information); 
printf("\t%s is lower bound\n".next_ field->)lower bound): 
printf(“\t%ld is increment\n".nèxt field->)increment); 
printf("\t%s is upper bound\n",next field->upper_ bound): 
next field = next field-»next: 

) 


list = list->next: 
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APPENDIX B. VIEW MATERIALIZATION SIMULATION PROGRAM 
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/* Title : View Materialization Simulation (vsgxpdp7) 


/* Author : Jesse T. South 

/* Date s: T TunéitJ91 

/* Revised 2 25 July 12991 

/* Modified : for general expressions 22 AUG by Curtis Barefield 
/* Purpose : Thesis Research 

/* System : IBM 80286 clone/ 80386SX 

/* Compiler : Microsoft C 6.0, INGRES precompiler, (Borland C++) 
/* Description : The program is part of a thesis 


#include <stdio.h> 

#include <stdlib.h> 

#include <time.h> 

Kinclude «math.h» 

exec sql include sqlca: 

#define size 16 

fdefine dbinfo "info.dat” 

define cntrlfl “cntrl.dat" 

#define update file "data in" 

#define finrslt "fnlrslt.dat" 

#define runrslt "rnrslt.dat” 

exec sql begin declare section: 

#define empinfo "empdat.dat" 

#define posinfo "posdat.dat" 

#define skilinfo “"skildat.dat" 

#define updatinfo "update.dat" 

exec sql end declare section: 

void open files(FILE**, FILE**, FILE**); 
void close file$lFILE**. FILE**;-PILE**); 


void init test database(int): 


m 
rA 
ui 
E 
a7 
ae 
ay 


Jr 


void scan dbinfo(long*, longf*, long*, int*, int*, int*, long*, long*, long*):; 


void create tables(void): 

void create views(int): 

void create update table(void): 

void copy base tables(void): 

void copy semi n full mats(int); 

void create table index(void): 

void module qm(char, int, long, double*, FILE*); 


void module sm(char, int, long. double*, FILE*); 
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void module fm(char, int, long, double*, FILE*); 
void write file headings(char*, char*, FILE*, FILE*); 
void write run result(char, char, int. long. double, long, FILE*); 
void write final result(int, int, long, int, long. long. long. float, 
float, float, float, float, double, double, double, FILE*, FILE*); 
void compute avg time(int, double*, double*, double*); 
void compute fv and fq and P(int, int, int, int, float*, long, long. long, 
long, flo&t*, int, int, float*): 
void compute table counts(long*, long*, long*, long, float*, float*); 
void refresh update text file(long, long, long): 
void main(void) 
{ 
int K, Q, updat siz. 1, run cnt s O, zero = 0; 
int vmax, vbase, vincr, viewcut: 
long ecard, pcard, scard, countb, countv, countq; 
long qmax, qbase, qincr, querycut: 
float fv, fva. fa, faa, P: 
double timeqm, timesm, timefm; 
char QUERY = 'Q', UPDATE = 'K'; 
char *prm ptr, parameter[10]. *updt ptr, updat rel[10]: 
DELE entr]efl o 9fresult fl, *run rslt: 
prm ptr » &parameter(O]: 
updt ptr - &updat rel[0]: 
opentfiles(&run realt. &cntrl fl, &fresult fl); 
scan dbinfo(&pcard, &ecard, &scard, &vmax, &vbase, &vincr, &qmax, &qbase, 
&qincr): 
WhbibÒe{!feoflentrl f1)) 
{ 
timeqm = timesm = timefm = 0.0; 
countb = county = countq = 0; 
fscanf(cntrl fl, "Xd Xld Xd Xd Xd Xs xs", &viewcut, &querycut, &K,. &Q. 
&updat siz, prm ptr, updt ptr): 
if (run cnt == zero) write file headings(prm ptr. updt ptr, fresult fl, 
run rslt):; 
init test database(viewcut): 
run_cnt++; 


printf(“"\n run # %d\n"”, run cnt): 
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O; i < K; i++) 


u 


for(i 


( 


refresh update text file(pcard, i; updat siz): 
module qm(UPDATE, viewcut, querycut, &timeqm, run rslt); 
module sm(UPDATE, viewcut, querycut, &timesm, run rslt):; 
module fm(UPDATE, viewcut, querycut, &timefm, run rslt): 
) 
for(i x0: 1 «(ON 
( 
module qm(QUERY, viewcut, querycut, &timeqm, run rslt); 
module sm(QUERY, viewcut, querycut, &timesm, run rslt); 
module fm(QUERY, viewcut, querycut, &timefm, run rslt): 
) 
compute avg time(Q, &timeqm, &timesm, &timefm): 
compute fv and fq and P(vmax, vbase, vincr, viewcut, &fv, qmax, qbase, 
qincr, querycut, &fq, K, Q, &P): 
compute table counts(&countb, &countv, &countq, querycut, &fva, &fqa): 
write final result(run cnt, viewcut, querycut, updat siz, countb, countv, 
countq, fv, fva, fq, fqa, P, timeqm, timesm, timefm, 
fresult fl, run rslt); 
exec sql disconnect; 
system("rmingres"); 
) 
close files(&run rslt, &cntrl fl, &fresult f1); 


printf("Aindisconnect completen"): 


) 
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void init test database(int viewcut) 
( 
system("destroydb magdi"): 
System("createdb magdi"): 
System("addingres -B -D64000"); 
exec sql whenever sqlerror stop: 
exec sql connect magdi; 
create tables(): 
create views(viewcut); 
copy base tables(): 
copy semi n full mats(viewcut):; 
create table index(): 


) 


Zoid open files(FILE.**run rslt. FILE **cntrl fl, FILE **fresult f1) 


( 


Ecntrl fl * fopen(cntrlfl, "r"):; 


*fresult fl s fopen(finrslt, "a"); 
*run rslt - fopen(runrslt, "a"): 
PoC Arun esit) ;. ('*entrl fl) ;; (!*fresult f1)) 
( 
printf("\nERROR: control or output files did not open"); 
fcloseall(): 
exec sql disconnect: 
exit(1); 
) 
) 
void close files(FILE **run rslt. FILE **cntrl fl. FILE **fresult fl) 
{ 
int i: 
fprintf(*£résult £1 n"): 
for (1s0:1«BO:1*«) fprintf(*fresult fl1,"*"): 
fclose (*run rslt): 
felose (tentrl_£1): 


fclose (*fresult fl): 


} 
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void scan dbinfo(long* ecard, long* pcard, long* scard, int* vmax, int* vbase, 
int* vincr, long* qmax, long* qbase, long* qincr) 
( 
FILE* db info; 
db info = fopen(dbinfo. “r"); 
if£(1db into) 
{ 
printf(“\NERROR: dbinfo file did not open "): 
fcloseall(): 
exec sql disconnect: 
exit(1): 
) 
fscanf(db info, "Xld €1d xXldWXn", &*ecard. &*pcard, &*scard): 
fscanf(db info, "Xd xd %d\n", &fvmax, &*vbase, &*vincr): 
fscanf(db info, "Xld %ld xld", &*qmax, &*qbase, &*qincr): 
fclose(db info): 
) 
void create tables() 
( 
/* create query modification tables */ 
exec Sql create table posqm 
(e num integer2, snum integer2. level integerl, keyno integer2, 
accinfo c86); 
exec sql create table empqm 
(e num integer2, dnum integer2, ename c20, address c70, 
salary integer4, title c30, jobdesc c60): 
exec sql create table skillqm 
(snum integer2, sname c20, stype c34); 
/* create semi-materialization tables */ 
exec sql create table possm 
(e num integer2, snum integer2. level integerl, keyno integer2, 
accinfo c86); 
exec sql create table empsm 
(e num integer2, dnum integer2, ename c20, address c?70, 
salary integer4, title c30, jobdesc c60): 
exec sql create table skillsm 


(snum integer2, sname c20, stype c34); 
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exec sql create table pos prim 
(e num integer2, keyno íinteger2): 
exec sql create table emp prim 
(e num integer2, ename c20, salary integer4): 
/* create full materialization tables */ 
exec sql create table posfm 
(e num integer2. snum integer2, level integerl, keyno integer2, 
accinfo c86); 
exec sql create table empfm 
(e num integer2, dnum integer2, ename c20, address c70, 
salary integer4, title c30. jobdesc c60); 
exec sql create table skillfm 
(snum integer2, sname c20. stype c34); 
exec sql create table full_mat 
(e num integer2, ename c20, salary integer4, keyno integer2); 


) 


void create views(int viewcut) 
{ 
exec sql begin declare section; 
int view_cut; 
exec sql end declare section: 


view cut = viewcut: 
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) 
void create update table() 
( 
exec sql create table update tbl 
(e num integer2, snum integer2, level integerl, keyno integer2, 
accinfo c86); 
exec sql copy table update tbl 
(e _ num = cOcolon, snum= cOcolon, level = cOcolon, 
keyno = cOcolon, accinfo = cOnl) 
from :updatinfo; 
) 
void copy base tables() 
( 
exec sql copy table posqm 
(e num = cOcolon, snum = cOcolon, level = cOcolon, keyno = cOcolon, 
accinfo = cOnl) 
from :posinfo; 


exec sql copy table possm 


(e_num = COcolon, snum = cOcolon, level cOcolon, keyno = cOcolon, 
accinfo = cOnl) 


from :posinfo: 


exec sql copy table posfm 


(e num » cOcolon, snum » cOcolon, level cOcolon, keyno = cOcolon, 
accinfo = cOnl) 
from :posinfo: 

exec sql copy table empqm 
(e num z cOcolon, dnum = cOcolon, ename = cOcolon. address = cOcolon, 


salary = cOcolon, title = cOcolon, jobdesc = cOnl) 


from :empinfo; 
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exec 


exec 


exec 


exec 


exec 


} 


sql copy table empsm 
(e_num = cOcolon, dnum = 
salary = cOcolon, title 
from :empinfo; 

sql copy table empfm 
(e _ num = cOcolon, dnum = 
salary = cOcolon, title 
from :empinfo: 

sql copy table skillqm 
(snum = cOcolon, sname = 
from :skilinfo: 

sql copy table skillsm 
(snum = cOcolon, sname = 
from :skilinfo: 

sql copy table skillfm 
(snum » cOcolon, sname = 


from :skilinfo: 


cOcolon, ename 


= cOcolon, address = cOcolon, 


= cOcolon, jobdesc = cOnl) 


cOcolon, ename 


= cOcolon, address = cOcolon, 


= cOcolon, jobdesc = cOnl) 


cOcolon, stype 


cOcolon, stype 


cOcolon, stype 


void copy semi n full mats(int viewcut) 


( 


exec sql begin declare section; 


int view cut: 


exec sql end declare section; 


view cut = viewcut: 


exec sql insert into pos prim (e num, keyno) 


select e_num, keyno 


from possm 


where level >= :view cut; 


cOnl) 


= cOnl) 


= cOnl) 


exec sql insert into emp prim (e num, ename, salary) 


select e num, ename, salary 


from empsm:; 
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) 


void create table index() 


( 

exec 
exec 
exec 
exec 
exec 
exec 
exec 
exec 


exec 


exec 


exec 


exec 


exec 


exec 


exec 


exec 


exec 


/* 


Sql modify empqm to cbtree on e num; 


sql modify empsm to cbtree on e num; 


sql modify empfm to cbtree on e num; 


sql modify posqm to cbtree on level; 


sql modify possm to cbtree on level; 


sql modify posfm to cbtree on level; 


sql modify emp prím to cbtree on salary: 


sql modify pos prim to cbtree on e num; 


sql modify full mat to cbtree on salary; 


create secondary indexes  */ 

Sql create index empqmdx 
on empqm (e num): 

sql create index empsmdx 
on empsm (e num); 

sql create index empfmdx 
on empfm (e num); 

Sql create index posqmdx 
on posqm (level): 

sql create index possmdx 
on possm (level): 

Sql create index posfmdx 
on posfm (level): 

sql create index e primdx 
on emp prim (salary): 

sql create index p primdx 


on pos prim(e num); 
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exec sql create index f matdx 
on full mat (salary): 
) 
void module qm(char cntrl char, int viewcut, long querycut, double *timeqm, 
FILE "run rsit) 
( 
clock_t tstart = 0, tstop = 0; 
double elap time: 
long tbl cnt = 0; 
exec sql begin declare section: 
int view cut: 
long query cut: 
long qnum: 
char qname[21]: 
long qkeyno: 
exec sql end declare section: 
exec sql declare qm cl cursor for 
select e num, ename, keyno 
from full view 
where salary >= :query cut; 
view cut = viewcut; 
query cut = querycut; 
switch(cntrl char) 
( 
case 'K': 
create update table(): 
exec sql insert into posqm 
select * 
from update tbl; 
exec sql drop update tbl; 
break; 
case 'Q': 
tstart = clock(): 
exec sql open qm_cl: 


exec sql whenever not found goto closeqm cl: 
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while(sqlca.sqlcode == 0) 
{ 
exec sql fetch qm cl 
into :qnum, :qname, :qkeyno; 
/* orinct( \nnumber - xd', qnum). */ 
EDIC 
) 
closeqm cl: 
exec sql whenever not found continue; 
tstop = clock(); 
exec sql close qm cl; 
break: 
default: 
printf("\nIncorrect control character\n"): 
break; 
} 
elap time = (tstop - tstart)/(double)CLK_TCK; 
*timeqm = *timeqm + elap_time; 
write run result('q'. cntrl char, viewcut, querycut, elap time, tbl cnt, 
run_rsi €): 
) 
void module sm(char cntrl char, int viewcut, long querycut. double *timesm, 
FILE *run rslt) 
( 
clock t tstart = O, tstop = 0; 
double elap time; 
long tbl cnt - 0; 
exec sql begin declare section: 
int view cut; 
long query cut: 
long snum; 
char sname[21]: 
long skeyno; 


exec sql end declare section; 
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exec sql declare sm cl cursor for 
select e num. ename, keyno 
from sm view 
where salary >= :query cut; 
view cut = viewcut: 
query cut = querycut; 
switch(cntrl char) 
{ 
case 'K': 
create update table(): 
exec sql insert into possm 
select * 
from update tbl; 
tstart s clock(): 
exec sql insert into pos prim 
select e num, keyno 
from update tbl 
where level >= :view cut: 
tstop = clock(): 
exec sql drop update tbl: 
break; 
case 'Q': 
tstart - clock(): 
exec sql open sm cl: 
exec sql whenever not found goto closesm cl: 
while (sqlca.sqlcode == 0) 
{ 
exec sql fetch sm cl 
into :snum, :sname, :skeyno; 
/* printf("\nsnum = Xd", snum): */ 
tbl_cnt++; 
} 
closesm cl: 
exec sql whenever not found continue; 
tstop = clock(): 
exec sql close sm cl: 


break: 
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default: 
printf("NNincorrect control character Mn"): 
break; 
) 
elap time = (tstop - tstart)/(double)CLK TCK: 
*timesm = *timesm + elap time: 
write run result('s', cntrl char, viewcut, querycut. elap time, tbl cnt, 
run rslt): 
) 
void module fm(char cntrl char, int viewcut, long querycut. double *timefm, 
FILE frun rsit) 
{ 
clock t tstart = 0, tstop = 0; 
double elap time; 
long gent = 0; 
exec sql begin declare section; 
int view cut: 
long query cut: 
long tbl cnt; 
long fnum: 
char fname[21): 
long fkeyno: 
exec sql end declare section: 
exec sql declare fm cl cursor for 
select e num, ename, keyno 
from full mat 
where salary »- :query cut: 
view cut = viewcut; 
query cut = querycut: 
switch(cntrl_ char) 


{ 
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case 'K' 
create update table(): 
exec sql insert into posfm 
select * 
from update tbl; 


tstart - clock(): 


tstop = clock(): 
exec sql drop update tbl: 
break; 
case 'Q': 

tstart - clock(): 

exec sql open fm cl; 

exec sql whenever not found goto closefm cl: 
while (sqlca.sqlcode == 0) 

{ 

exec sql fetch fm_cl 

into :fnum, :fname, :fkeyno: 

/* printf("Mn fnum s Xd", fnum); */ 
qcnt++; 

) 

closefm cl: 

exec sql whenever not found continue; 
tstop = clock(): 

exec sql close fm cl; 


break: 
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default: 


printf("\Nincorrect control character\n"): 


break: 
} 
elap time = (tstop - tstart)/(double)CLK_TCK; 
*timefm = *timefm * elap time; 


exec sql select rowtot = count(e num) 

into :tbl cnt 

from full mat 

where salary >= :query cut; 
write run result('f', cntrl char, viewcut, querycut, elap time, tbl cnt, 

run rslt): 
) 
void write file headings(char* param, char* updt tbl. FILE* fresult fl, 
FILE* run rslt) 
( 
time t today t: 
time(&today t): 
fprintf(fresult fl."MAn Xs - FINAL RESULTS (vsgxpdp7) -\n". ctime(&today t)): 
fprintf(fresult fl."\n The X*s is the parameter being tested", param): 
fprintf(fresult fl."\n The %s table is the table being updated", updt tbl): 
fprintf(run rslt."An %s - RUN RESULTS (vsgxpdp7) -Mn", ctime(&today t)): 
fprintf(run rslt,"Mn The X*s is the parameter being tested", param): 
fprintf(run rslt,"Mn The Xs table is the table being updatedWMn", updt tbl1): 
) 
void write run _result(char strat. char cntrl_char, int viewcut. long querycut. 
double elap time. long tbl_cnt, FILE *run_rslt) 
( 
printf("\n%cm cc=%c vc=%d qc=%1l1d et=%.21f tc=%ld", strat, cntrl char, 
viewcut, querycut, elap time, tbl cnt): 

fprintf(run rslt."\n%cm cc=%c vc=%d qc=%1d et=%.21f tc=%ld",strat, 


cntrl char, viewcut, querycut, elap time, tbl cnt): 
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void write final result(int run, int viewcut, long querycut, int updt siz, 
long countb, long countv, long countq, float fv, 
float fva, float fq. float fqa, float P, 
double timeqm, double timesm, double timefm, 
ETLERSEneeult fl SEILE *runorslt) 
{ 
printf("\n\nRUN# %d, VCUT= %d, QCUT= %ld, #TUP= Xd, BASE- %ld, VIEW= %1ld,"\ 
" QUERY= %ld", run, viewcut, querycut, updt siz., countb, county, 
countq): 
printf("\nFV= %.2f, FVA- Xf, FQ- X*.2f, FQA- Xf P- %.2f".fv, fva, fq. 
fqa, P): 
printf("\nTIMEQM= %.31f sec, TIMESM= %.31f sec, TIMEFM= %.31f sec\n", 
timeqm, timesm, timefm): 
fprintf(fresult_f1."AniNRUN% %d, VCUT= %d, QCUT= $ld, #TUP= %d, BASE= %ld,"\ 
" VIEW= %ld, QUERY= %ld", run, viewcut, querycut, updt siz, countb, 
countv, countq): 
fprintf(fresult_ fl."\NFV= %.2f, FVA= Xf, FQs %.2f, FQA= %f P= %.2f",fv. fva, 
fq. fqa, P): 
fprintf(fresult_f1,"ANTIMEQM= %.31f sec, TIMESM= %.31f sec, TIMEFM= %.31f"\ 
" sec\n",timegm, timesm, timefm); 
fprintf(run _ rslt."\n\NRUN# %d, VCUT= %d, QCUT= %ld, #TUP= %d, BASE= %ld,"\ 
" VIEW= %ld, QUERY= %ld", run, viewcut, querycut, updt siz, countb, 
countv, countg): 
fprintf(run_rslt,"ANFV= %.2f, FVA= %f, FQ= %.2f, FQA= Xf P= %.2f", fv, fva, 
fq. fqa. P): 
fprintf(run rslt,"\NTIMEOM= %.31f sec, TIMESM= %.31f sec, TIMEFM= %.31£"\ 
" secin",timeqm, timesm, timefm); 
) 
void compute avg time(int Q, double *timeqm, double *timesm, double *timefm) 
( 
If(Q > 0) 
( 
*timeqm = *timeqm / (double)Q: 


*timesm - *timesm / (double)Q: 


*timefm *timefm / (double)Q: 


) 
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else 
( 
printf("\n\NERROR: dividing times by 0, **** results are VOID ****\n"); 
} 
} 
void compute fv and fq and P(int vmax, int vbase, int vincr, int vcut, 
float *fv,long qmax. long qbase, long qincr, 


long qcut, float "fq, int K. "int Q, float *P) 


*fv - (float)(vmax) - ((float)(vcut - vbase) / (float)(vincr)): 
*fv - (*fv + (float)(vincr) / (float)(vincr)) / (float)(vmax): 


* fa 


(float)(qmax) - ((float)(qcut - qbase) / (float)(qincr)): 
*fq - (*fq * (float)(qincr) / (float)(qincr)) / (float)(qmax): 
*P s (float)(K)/(float)(K + QJ): 
) 
void compute table counts(long *countb, long *countv, long *countq. 
long querycut, float *fva. float *fqa) 
{ 
exec sql begin declare section: 
long query cut: 
long tbl ent; 
exec sql end declare section; 
query cut = querycut; 
exec sql create table base mat 
(e num integer2, ename c20, salary integer4, keyno integer2): 
exec sql insert into base mat (e num, ename, salary, keyno) 
select empfm.e num, empfm.ename, empfm.salary. posfm.keyno 
from empfm, posfm 
where empfm.e num = posfm.e num; 
exec sql select rowtot = count(e num) 
into :tbl cnt 
from base mat; 
*countb = tbl cnt; 
exec sql select rowtot = count(e num) 
into :tbl-ent 
from full mat; 


*countv » tbl cnt; 
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exec sql select rowtot - count(e num) 
into :tbl cnt 
from full mat 
where salary >= :query cut: 
peountq = tbl ent: 


*fva = (float) ((double)*countyv / (double) *countb): 


*fda (float)((double)*countq / (double)*countv): 
exec sql drop base mat; 
) 
void refresh update text file(long card. long i. long update siz) 
( 
long update base: 
int num of fields, j, change field = 4; 
char file name[size] = updatinfo, *file ptr: 
FILE *updat fl: 


struct field attrib 


char field type: 

int field width; 

char field info; 

long lower bound: 

int increment; 

long upper bound; 

struct field attrib *next; 

E 

struct field attrib *first field = NULL; 
struct field attrib *current field = NULL; 
struct field attrib *print_ptr = NULL; 
file ptr = &file name(0]; 


update base = (i * update siz) + card +1; /* compute new key base number */ 


PES 


/** Read old control input for data generation program  **/ 
updat_ fl = fopen(update file, "r"); 
if('updat tl 
( 
printf("\NERROR: update control file did not open to read"): 
fcloseall(): 
exec sql disconnect; 
exltil):; 
} 
fscanf(updat fl. "%*d\n"); 
fscanf(updat fl, "%d\n". &num of fields): 
fscanf(updat fl. “%*s\n"); 


for (j = 1: j <= num of fields; j++) 


first field - (struct field attrib*)malloc(sizeof(struct field attrib)Lh 
if (first field == NULL) printf("ANERROR: Memory did not allocate!!!"): 
current field - first field; 
) 

else 


( 


current field-»next =(struct field attrib*)malloc(sizeof(struct field attrib)): 
current field * current field-»next; 
) 
current field-»next = NULL; 
fscanf(updat fl. "WMn&cWMn", &current field-»field type): 
fscanf(updat fl, "%d\n", &current field-»field width); 
fscanf(updat fl, FLACA. &current field-»field info): 
fscanf(updat fl, "%ld\n". &current field-»lower bound); 
fscanf(updat fl, "XdWMn", &current field-»increment); 
fscanf(updat fl, "XldWMn", &current field-»upper bound): 
if (j == change field) /* changing base for keyno field */ 
( 
current field-»lower bound » update base; 
) 
) 


fclose(updat f1); 
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/** write updated control input for data generation program  **/ 


updat fl - fopen(update file, "w"): 
Drüo'updat fl) 
( 
printf("NNERROR: update control file did not open to write"); 
fcloseall(): 
exec sql disconnect; 
exit(1): 
) 
fprintf(updat fl, “"%ld\n". update siz); 
fprintf(updat fl, "%d\n", num of fields): 
fprintf(updat fl, "x$", file ptr): 
print ptr s first field; 
while(print ptr != NULL) 
( 
fprintf(lupdat.fl, "An inse no Print ptr-»field type): 
fprintf(updat fl, "XdWMn", print ptr-»field width): 
fprintf(updat fl, "€cWMn", print ptr-»field info); 
fprintf(updat fl, "€*ldWMn". print ptr-»lower bound): 
fprintf(updat fl, "€XdMn", print ptr-»increment): 
fprintf(updat fl, "%ld", print ptr->upper bound); 
print ptr - print ptr-»next:; 
) 
fclose(updat_ fl): 


system("datagen"); 


) 
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